1. Overview


The goal of the project is the development of a shared, large-scale
data system for the ARPA community.

The system may be viewed as a box that performs the functions
of data storage and data management on behalf of multiple
computers simultaneously connected to the box.

The  box  contains  a  large-scale  tertiary  storage  device,  secondary 
storage  (disks)  for  staging,  and  a  medium-scale  computer  for 
performing  data  management  functions. 

Access  to  the  box  is  through  a  device-independent  notation, 
datalanguage.  This  language  is  being  designed  for  use  in  the 
Arpanet  as  a  standard  means  of  access  to  remotely  located  data. 

It  contains  features  specifically  designed  for  sharing  data 
among  programs  that  operate  on  different  machines,  for  describing 
a  broad  class  of  data  structures,  and  for  allowing  arbitrary 
subsets  of  large  files  to  be  selected  efficiently  at  run-time. 

2. Hardware Installation


A PDP-10 system was delivered to CCA late in the last reporting
period. This system was checked out, and regular DEC maintenance
was begun in March. Also in March a BBN Model 701 Pager
was delivered and integrated into the system. We are expecting
delivery of a TIP early in the next reporting period.

With  the  addition  of  the  TIP,  the  installation  will  be  as 
shown  in  Fig.  1. 

System performance during most of this period was poor. Problems
arose in various areas, but were centered on the ME10
core memories (these were new DEC products, replacing the
better-debugged older MA10s) and the disk controller.

Towards the end of this period, DEC cooperated by providing
on-site personnel daily and 24-hour-a-day on-call maintenance.
Furthermore, all outstanding ECOs were installed, and a major
re-cabling effort took place.

A new policy of keeping power on for all units 24 hours
per day was instituted. Subsequently, starting in July,
performance began to improve markedly, and became satisfactory
at the end of this period.

In regard to overall hardware system architecture, Working
Paper No. 6, "Datacomputer Hardware Architecture", was
completed and distributed. An activity aimed at evaluating
existing tertiary storage devices was initiated; results
will be given in the report for the next period.




3.  Software  Design  and  Implementation 

During this period, the software system design reached a
fairly stable state as documented in Working Paper No. 5,
"Datacomputer Software Architecture - Revision 1", dated
February 29, 1972, which is included here as Appendix A.

Regarding  software  implementation,  there  is  as  yet  little 
progress  to  report.  The  immediate  goal  is  the  generation  of 
a  complete, though  primitive,  system  in  time  to  give  a 
demonstration  at  the  ICCC  Conference  in  October.  The  results 
of  this  endeavor  will  be  discussed  in  the  report  for  the  next 
period. 


4.  Coordination  Activities 

4.1  Meetings  and  Conferences 

A substantial activity has developed during this period
dealing with technical coordination with potential users of
the datacomputer system, providing information to interested
members of the computer science community, government, and
industry. In addition to the work related to the Weather
Database Working Group (see below), interaction took place
with U. of Illinois (Center for Advanced Computation),
NASA/Ames (Institute for Advanced Computation), RAND
Corporation, NIH, National Library of Medicine, U. of
Michigan, and DOT. A technical presentation on the project
was given to the IEEE, Boston Section, on May 23.

A significant conference was held at NASA/Ames on May 25 with
representation from the Illiac IV project (M. Pirtle), ARPA
(L.G. Roberts) and CCA (T. Marill). It was decided that the
datacomputer software would run at NASA/Ames on a non-dedicated
PDP-10/TENEX, using the installed UNICON 690 for tertiary
storage. At CCA the system would continue to run on a dedicated
machine, to offer backup for the Ames system through the
network, and to offer a high-speed direct-connection option
to Boston-area users.

4.2  Weather  Database  Working  Group 

The  Weather  Database  Working  Group  (WDBWG)  had  been  set  up 
during  1971  with  the  mission  of  coordinating  plans  for  the 
loading  of  the  ETAC  weather  data  base  into  the  datacomputer 
system,  for  keeping  the  information  up-to-date,  and  for 
providing access to interested groups.

The second meeting of WDBWG took place in Washington, D.C.
on  February  10  with  participation  from  CCA  as  well  as  ARPA, 
RAND,  ETAC,  AWS,  NCAR  and  NOAA.  It  was  decided  that  the 
analysis  and  upper  air  files  will  be  kept  on-line.  The 
mandatory  surface  data  will  be  broken  up  into  a  set  of 
chronological  station  files,  each  one  of  which  will  be  one 
datacomputer  file.  Date,  time,  block  and  station  numbers 
should  be  inverted. 

Appendix A


Working Paper No. 5, "Datacomputer Software Architecture -
Revision 1", February 29, 1972.

Datacomputer Software
Architecture

Revision 1


Datacomputer Project
Working  Paper  No.  5 
February  29,  1972 


Contract No. DAHC04-71-C-0011
ARPA  Order  1731 


Computer Corporation of America

575  Technology  Square 
Cambridge,  Massachusetts  02139 


Preface 


This paper discusses the concepts and the functional design
of the datacomputer software. It is a revision of Working
Paper 1 of this series, and presents a revised architecture.
The most important change to the architecture is the definition
of a fifth major system component: the directory
system. A large number of minor changes have also been made,
and the content of the paper has been reorganized.

Other papers in the series discuss the access language, the
file structures, the hardware of the system, and related
topics. Further papers will be issued from time to time.

All papers are subject to revision without notice.

Chapter 1
Introduction 


1.1  The  Datacomputer 

The  datacomputer  is  a  system  which  performs  data  storage  and 
data  management  functions. 

One may consider the datacomputer as a black box with multiple
physical ports to which processors can be interfaced.


Figure 1.1 - The Datacomputer


Each of the processors can itself have multiple users, which
can avail themselves of the services that the datacomputer
offers.

Specifically,  these  services  are: 

1. On-line storage of data and data descriptions. A
data file can be unusually large, up to one trillion bits
(roughly the equivalent of 10,000 reels of magnetic tape).

2. Retrieval of data (whole files, subsets of files,
individual data elements).

3. File maintenance functions, that is, addition of
new data, deletion of old data, changes to existing data.

4. Data reformatting.

5. Backup and recovery mechanisms, for use in case
of failure in the datacomputer or in one of the user systems.

6. Accounting, for allocating charges to users.

7. Data sharing, allowing the same data bases to be
accessed by different users.

8. Data privacy, preventing unauthorized access to
data.

9. Simultaneous multi-user access, allowing more than
one request to be serviced simultaneously.

A user program in an external computer interacts with the
datacomputer only through datalanguage, a system of notation
developed for this purpose. This increases the degree
of integrity and privacy that can be achieved for the
stored data, and improves the reliability of the system.

It  also  allows  users  the  convenience  of  working  with  a  tool 
specifically  designed  for  the  job  they  are  doing. 

The datacomputer system is dedicated to data management
and implemented on a large scale. Thus it offers more
cost-effective and more extensive data management services
than systems designed primarily for other purposes. Its
hardware and software are specialised for the problems they
most frequently encounter. On-line storage is orders of
magnitude cheaper than in conventional systems. Data
formats are flexible, and the variety in data structure
is large.

1.2 Datacomputer for the ARPANET

The datacomputer for the ARPANET has two physical ports, as
shown below:


Figure 1.2 - The ARPANET Datacomputer

Here, the IMP is connected to the network and consequently
allows a large number of processors to access the datacomputer
through a low-speed (50,000 bits/second) port.
Users  of  the  ILLIAC  IV  have  access  at  data  rates  several 
orders  of  magnitude  higher  than  those  available  through  the 
IMP. 


Inside the datacomputer box are a modified PDP-10 processor,
a BBN pager, several core memories, disks, interfaces to
the IMP and ILLIAC, and a Precision Instruments UNICON 690
laser mass memory system. The UNICON contains three processors,
two of which were built specifically for the storage
system. The software for these processors is an integral
part of the datacomputer software, and is outlined in Section
3.4. The UNICON has an on-line storage capacity of nearly
one trillion bits. It also has the ability to mount and
dismount storage packs of 25 billion bits, giving it
unlimited off-line capacity on a low-cost medium.
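
To put the quoted figures in perspective, the following back-of-the-envelope sketch estimates how long it would take to move data through the 50,000 bits/second IMP port. It uses only numbers stated in the text; the function name is illustrative, and the calculation assumes an idealized link with no protocol overhead or contention.

```python
IMP_BITS_PER_SECOND = 50_000   # low-speed port rate quoted in the text


def transfer_seconds(bits, rate=IMP_BITS_PER_SECOND):
    """Idealized transfer time: no protocol overhead, no contention (assumption)."""
    return bits / rate


full_store = 10**12        # "nearly one trillion bits" of on-line storage
one_pack = 25 * 10**9      # one dismountable storage pack

print(transfer_seconds(full_store) / 86_400, "days for the full store")
print(transfer_seconds(one_pack) / 3_600, "hours for one pack")
```

Under these assumptions the full store would take roughly 230 days to ship through the network port, and a single pack roughly 139 hours, which suggests why selecting subsets inside the datacomputer, rather than shipping whole files, matters for network users.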

This hardware configuration has been carefully designed and
may be specialized further as the implementation continues.
However, there is a level of design that is completely
independent of the configuration. This includes the access
language, most of the data storage, retrieval, and organization
techniques, the interfaces of the five major modules,
and their functional design. In addition, almost all of
the software is independent of the mass memory system used.
This point is particularly important, because mass memories
are expected to improve considerably over the next few
years. Thus additional mass storage devices can be accommodated
with a minimum of reprogramming, and the entire
configuration can be changed without loss of most of the
design work.

1.3 Architectural Overview


Figure 1.3 - Architectural Overview


The software of the datacomputer has five components: the
request handler, the storage manager, the supervisor, the
directory system, and the input-output (I/O) manager.

The storage manager, directory system, I/O manager, and
supervisor comprise an extended operating system that
supports one process (i.e., one job) for each user connected
to the datacomputer. All of these processes
execute the same program: the request handler. Each
process acts independently, contending with the others
for the resources of the system, and is concerned only
with servicing its own user. The processes are started
and stopped by the operating system and behave somewhat
like user jobs in an ordinary multi-programmed computer.

While  the  request  handler  is  concerned  only  with  the  user 
it  is  currently  servicing,  other  datacomputer  software 
modules  are  normally  concerned  with  the  entire  system. 

The  storage  manager  schedules  for  efficient  use  of  the 
storage devices, at times degrading the service to one user
while  improving  the  service  to  others.  The  supervisor 
and  I/O  manager  have  scheduling  functions  which  they  carry 
out  with  a  similar  philosophy. 

The  functions  of  each  of  the  five  modules  are  described 
briefly  below. 


Request  Handler 

The request handler processes all datalanguage, including
data descriptions; it structures data for storage and for
output to users, determines and executes strategies for
the accessing of stored data, and performs all data
conversions.


Storage Manager

The storage manager controls the allocation of datacomputer
storage and the scheduling of all storage device operations.
It converts logical storage addresses to physical addresses
and converts data to and from physical storage formats.


Input-Output  Manager 

The  input-output  manager  provides  an  interface  between  the 
request  handler  and  the  outside  world.  To  the  request 
handler,  its  services  are:  standard  formats  for  all  input 
and output data, standard error control, and standard control
of connections with users. It accepts output from
the  request  handler  whenever  it  is  generated,  and  accepts 
input  from  users  when  presented  with  it. 


Supervisor 

The  supervisor  schedules  the  use  of  the  central  processor, 
creates and deletes processes, and offers a variety of
services  normally  associated  with  a  multi-programmed 
operating  system. 


Directory  System 

The  directory  system  catalogs  data  descriptions,  file  names 
and locations, file security information, and some accounting
information.

Chapter 2

The  Request  Handler 
2.1  Request  Handler  Function 

All of the services offered by the datacomputer are implemented
in the request handler. The user's datalanguage is
processed and acted upon. Data is organized for storage,
retrieved on demand, and formatted for output.


2.2  Request  Handler  Point  of  View 

The  request  handler  acts  as  though  it  were  servicing  a 
single  user  at  a  single,  constant  data  rate,  and  working 
with  a  very  simply  and  uniformly  organized  storage  system. 
It  is  thus  independent  of  system  loading,  buffering, 
scheduling,  data  rate,  and  device-peculiar  data  access 
considerations.  It  conceives  data  access  strategies  based 
only  on  the  logical  organization  of  data. 

The  request  handler  aids  the  other  modules  in  their 
scheduling  tasks  by  generating  certain  information  for 
them.  For  example,  it  gives  advance  notice  to  the  storage 
manager,  when  possible,  of  its  requirements  to  access 
particular  data.  It  also  indicates  convenient  points  for 
interruption,  so  that  the  system  can  efficiently  switch 
to other tasks.

The request handler may conceive a sub-optimal strategy
for a particular request, because it is relatively ignorant
of the physical organization of the data. In this case,
however, the system does not necessarily execute a sub-optimal
strategy, since the storage manager has sufficient
information to re-order the access operations. While the
end result is not infallibly optimal for any particular
request, it is globally more efficient than schemes that
fully optimize each request with consequent loss of control
at the system level. The present scheme also has the
advantage of isolating the request handler from the
considerations mentioned above.

2.3 Datalanguage


The activities of the request handler are best defined by
datalanguage, its interface to the datacomputer user. In
datalanguage, data is described and referenced:

A. so that the datacomputer can determine access paths
and optimize the access process. Normally the user specifies
the name or content of the data items desired, leaving the
datacomputer to decide how to retrieve them. This is a
convenience for the user who is not interested in the details
of data structures. It is also an optimisation for the usual
case where the user is unaware of the actual location and
storage format (for example, network users may frequently
be unaware that their data is even on the datacomputer).

B. so that data sharing is facilitated. This is
accomplished because data can be so thoroughly described
that the datacomputer is aware of the format of the data
in machine-independent terms. Thus the user can specify
the format desired and leave the datacomputer to convert
the data to this format if conversion is required.

C. in a concise, natural and convenient manner. This
is accomplished primarily because the language need handle
only data management problems, and is specialized for them
(see Working Paper No. 3).

2.4 Data Storage and Access Techniques

While  datalanguage  allows  users  to  program  their  own  access 
techniques,  most  will  prefer  to  use  the  ones  built  into  the 
system. 

Consider the problem of storing a large number of weather
observations, say 500 million. Such a database might require
10 to 100 billion bits of storage. Even if the data
is sorted on location and time, the problem of efficiently
finding the data of interest in a particular problem is
not simple. A typical datalanguage request for this file
might ask for all the observations from a particular area
showing high winds and warm temperatures. Without special
data organization techniques, even if the area were small,
one might expect to examine 1 million observations to
extract the relevant ones.

There are only two ways to process such requests efficiently.
One is to duplicate the data, sorting it different ways.
This is impractical because of storage requirements. The
second is to maintain an auxiliary body of information that
aids in answering the questions.

The organization of such a body, the application of it to
retrieval, and the maintenance of it across changes to the
data are the subject of a working paper "File Structures And
Access Techniques". The basic technique employed is the
use of inverted files, with extensions to accommodate files
ordered by content, range retrievals, Boolean expressions,
and tree-structured files. The techniques are relatively
insensitive to file size, and are expected to produce good
results in the trillion bit range. Other common techniques,
such as exhaustive sequential search, are used by the system
when the more complex ones do not apply.
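
The inverted-file idea can be illustrated with a minimal sketch. This is not the datacomputer's actual file structure (that is the subject of the working paper cited above); the record layout, field positions, thresholds, and function names below are all hypothetical. Each inverted file maps a field value to the record numbers that hold it, so a Boolean request with range conditions can be answered by intersecting small sets instead of scanning every observation.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Hypothetical observation records: (station, wind_knots, temp_c)
observations = [
    ("A", 12, 25),
    ("A", 45, 31),
    ("B", 50, 8),
    ("A", 40, 29),
    ("B", 5, 30),
]


def build_inverted(records, field):
    """Map each value of one field to the list of record numbers holding it."""
    index = defaultdict(list)
    for recno, rec in enumerate(records):
        index[rec[field]].append(recno)
    return dict(index)


def range_lookup(index, low, high):
    """Record numbers whose indexed value lies in [low, high]."""
    keys = sorted(index)  # sorted on each call for brevity only
    first, last = bisect_left(keys, low), bisect_right(keys, high)
    hits = set()
    for key in keys[first:last]:
        hits.update(index[key])
    return hits


by_station = build_inverted(observations, 0)
by_wind = build_inverted(observations, 1)
by_temp = build_inverted(observations, 2)

# "station A AND wind >= 35 AND temp >= 28": intersect the hit sets,
# then touch only the matching records in the data file.
hits = (set(by_station["A"])
        & range_lookup(by_wind, 35, 10**9)
        & range_lookup(by_temp, 28, 10**9))
result = [observations[i] for i in sorted(hits)]
print(result)
```

The price of this approach is that every inverted file must be kept up to date when the data changes, which is the maintenance problem the text refers to.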

When the user requests the observations with high winds and
warm temperatures, he need not be aware of the size of the
database, the organization, or even the presence or absence
of an inverted file. He defines in datalanguage the content
and format of the desired data, and leaves the datacomputer
to locate it the best possible way. Thus he is
independent of all structure in the database except that
which is basic to the use of the data, or that which affects
performance so radically that it determines the user's
behaviour in turn. This independence gives the datacomputer
maximum freedom in organizing data.

Large, shared databases like the weather file described
here will be reorganized from time to time as usage patterns
and requirements alter. These reorganizations will
be invisible, except for their effect on performance, to
users that have been letting the datacomputer determine
access strategies for them. Thus there are opportunities
with the datacomputer to engineer global optimizations in
the use of databases.

Chapter 3

The  Storage  Manager 


3.1  Storage  Manager  Function 

The datacomputer storage system includes core memory,
conventional direct access devices, and a mass storage device.
This system is complex, and its behaviour and configuration
are subject to change during the life of the software. The
storage manager has the function of using this system
optimally, while presenting a stable and relatively simple
interface to the request handler, directory system, and I/O
manager.


3.2  Storage  Manager  Concepts 

Storage  manager  users  can  operate  on  data  only  when  it  is 
in  a  buffer,  a  block  of  core  locations  of  fixed  size.  Each 
user  has  a  buffer  table,  containing  a  pointer  to  each  of 
his  buffers.  All  references  to  data  in  a  buffer  are 
indirect,  through  the  buffer  pointer.  Users  freely  access 
buffers  as  though  they  were  always  in  core.  In  practice, 
to  optimize  core  usage,  the  contents  of  the  buffers  are 
continually  being  moved  out  to  disk  and  back  to  core  again. 
This movement of data, and the corresponding relocation of
the buffer pointer, is invisible to the user.
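
The indirection through the buffer table can be sketched as follows. This is a toy model under stated assumptions, not the storage manager's implementation: the class and method names are invented for illustration, "core" and "disk" are plain dictionaries, and a buffer is only a few bytes long. The point it shows is that because every reference goes through the buffer number, the contents can be moved out and back without the user noticing.

```python
class BufferTable:
    """Toy model: users reach data only through a buffer number."""

    def __init__(self):
        self._core = {}   # buffer number -> bytearray currently in core
        self._disk = {}   # buffer number -> bytes swapped out to disk

    def alloc(self, number, size=8):
        self._core[number] = bytearray(size)   # new buffers start zeroed

    def swap_out(self, number):
        """Storage manager's decision: free core by moving contents to disk."""
        self._disk[number] = bytes(self._core.pop(number))

    def _resolve(self, number):
        """Follow the pointer, bringing the contents back to core if needed."""
        if number not in self._core:
            self._core[number] = bytearray(self._disk.pop(number))
        return self._core[number]

    def read(self, number, offset):
        return self._resolve(number)[offset]

    def write(self, number, offset, value):
        self._resolve(number)[offset] = value


table = BufferTable()
table.alloc(1)
table.write(1, 0, 42)
table.swap_out(1)           # invisible to the user of buffer 1
print(table.read(1, 0))     # 42
```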

All  data  are  stored  as  updatable  pages,  which  are  blocks 
of  data  large  enough  to  fill  one  buffer.  Each  page  has  a 
unique  identifier,  called  the  logical  page  address  (LPA), 
and  used  to  reference  it  in  commands. 

Via commands to the storage manager, users can create,
delete, and access scratch pages and data pages. Scratch
pages are used for storage of temporary and intermediate
results that cannot conveniently be kept in the available
buffers. Data pages are used for permanent storage of data.

When  a  scratch  page  or  data  page  is  read  into  a  user’s 
buffer,  the  original  of  the  page  in  storage  ceases  to  exist 
for  most  purposes.  When  the  user  modifies  the  contents  of 
the  buffer,  he  is  now  modifying  the  original  and  permanent 
copy  of  the  page.  In  general,  other  requests  for  the  page 
are  satisfied  by  returning  a  pointer  to  the  same  copy.  This 
scheme  uses  buffer  space  economically  and  eliminates 
unnecessary accesses to secondary storage. It also necessitates
careful treatment of the data in the user's buffer,
since it is logically the original of the page.

Consider the operation of extracting half of the data from
a page, prefixing a count to it, and passing it to an output
routine. This is most easily done by reading the page
into a buffer, inserting the count at the proper position,
and calling the routine with a pointer into the buffer.
However, inserting the count into the buffer permanently
alters the page, because the data in the buffer is the only
and original copy of it. The relevant data could be copied
into a second buffer, and there prefixed with a count. This
undesirable solution creates two physical copies of the data
in core, when typically only one is needed, and a third copy
exists somewhere in storage.

This problem is solved by defining the <COPY> operation,
which makes a private copy of the contents of a buffer. The
copy of the data in the user's buffer loses its identity as
a scratch or data page. It now has the same status as
random data placed by a user into a newly allocated buffer.
If the buffer contains the only physical copy in the system
of the data, then the <COPY> does the in-core copy-over
required.

(Note: In practice, the association between private page
and original copy is maintained until the private page is
modified. This further optimizes the use of storage space
and devices in certain cases, and is possible because of
the implementation, which uses special paging hardware.
In the general design, no such hardware is assumed, and the
storage manager behaves as described here.)
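
The difference between a buffer that refers to the original of a page and a buffer that has been detached by COPY can be sketched in a few lines. This is an illustrative model only: the class, method names, and page contents are assumptions, and the copy here is made eagerly, as in the general design, rather than deferred until modification as in the paging-hardware variant mentioned in the note.

```python
class StorageManager:
    """Toy model of page sharing and the COPY operation (names are illustrative)."""

    def __init__(self, pages):
        self.pages = pages     # LPA -> bytearray: the one original of each page
        self.buffers = {}      # (user, buffer number) -> bytearray

    def read_page(self, user, buf, lpa):
        # No data is duplicated: the buffer refers to the original itself.
        self.buffers[(user, buf)] = self.pages[lpa]

    def copy(self, user, buf):
        # Detach the buffer from the page by giving it private contents.
        self.buffers[(user, buf)] = bytearray(self.buffers[(user, buf)])


sm = StorageManager({100: bytearray(b"ABCD")})
sm.read_page("u1", 1, 100)
sm.read_page("u2", 1, 100)

sm.copy("u1", 1)
sm.buffers[("u1", 1)][0:0] = b"\x02"   # prefix a count; private, so page 100 is safe
sm.buffers[("u2", 1)][0] = ord("Z")    # u2 edits the shared original directly

print(bytes(sm.pages[100]))            # b'ZBCD'
print(bytes(sm.buffers[("u1", 1)]))    # b'\x02ABCD'
```

User u1's prefixed count never reaches page 100, while u2's change does, because u2's buffer is still the original.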

The storage manager is expected to perform so that:

A. buffers declared by the user to be <IN-USE> are
normally in core when the user is running.

B. other buffers are available after (at worst) short
delay.

C. retrieval of scratch pages usually requires a
short delay.

D. retrieval of data pages can usually be achieved
with only short or medium delays, especially when intention
to access them has been stated in advance, with a <PRE-READ>
operation. (See below)

However,  not  all  data  pages  are  equally  accessible.  They 
are  organized  into  files,  and  the  initial  access  to  a  file 
may  occasion  a  longer  access  time. 

For  the  planned  configuration,  short  delays  are  10-200 
milliseconds,  medium  delays  are  200  milliseconds  to  1 
second,  and  long  delays  are  1-100  seconds. 

3.3 Storage Manager Interface

The storage manager accepts the following commands from its
users:

ALLOC  <buffer  number> 

A buffer containing all zeroes is created and can be
referenced by its buffer number in subsequent commands.
Data in the buffer is effectively in core and can be
operated upon directly by its owner.

REL  <buffer  number> 

The  buffer  is  disassociated  from  its  contents,  and  the 
buffer  number  becomes  available  for  re-use.  Until  the 
buffer  is  re-allocated,  references  to  locations  in  it  are 
in  error.  If  the  buffer  contains  a  data  or  scratch  page 
when  released,  that  page  must  not  have  been  modified  while 
in  the  buffer. 

DPR  <buffer  number,  data  page  LPA,  ROB> 

SPR  <buffer  number,  scratch  page  LPA,  ROB> 

A  buffer  containing  the  specified  page  is  created,  and  can 
be  referenced  by  the  buffer  number  in  subsequent  commands. 
Data  in  the  buffer  is  effectively  in  core  and  can  be 
operated  upon  directly  by  its  owner.  The  data  can  be  modi¬ 
fied  if  the  read-only  bit  (ROB)  is  zero. 

DPW <buffer number, data page LPA, ROB>

SPW <buffer number, scratch page LPA, ROB>

If  the  buffer  contains  the  specified  page,  the  buffer  is 
disassociated  from  the  page  and  released  (see  REL,  above). 

If  the  buffer  does  not  contain  the  specified  page,  then 
the  buffer’s  contents  replace  the  page.  The  buffer  is 
then  disassociated  from  its  contents  and  released. 

The read-only bit provides a means to request error-checking
services from the storage manager. It can be set
to one only in the first of the two cases explained above.
When so set, it requests that an error condition be raised
if the page has been modified. This is identical to releasing
the buffer, except that it also ensures that the page
in the buffer has the LPA supplied in the command.
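
One possible reading of the two write cases and the read-only bit is sketched below. It is an interpretation, not the storage manager's code: the function name, the dictionary layout, the exception type, and the choice to leave the buffer allocated when an error is raised are all assumptions, and the page store is modeled as a separate dictionary for simplicity.

```python
class PageWriteError(Exception):
    """Hypothetical error condition raised by the toy write command."""


def page_write(buffers, pages, buf, lpa, rob=0):
    """Toy DPW/SPW.

    buffers: buffer number -> {"lpa": int or None, "data": bytearray, "modified": bool}
    pages:   LPA -> bytes
    """
    entry = buffers[buf]
    holds_page = entry["lpa"] == lpa
    if rob == 1 and not holds_page:
        # ROB may be set only when the buffer already holds the named page.
        raise PageWriteError("buffer does not hold the page named in the command")
    if rob == 1 and entry["modified"]:
        raise PageWriteError("page was modified although ROB is set")
    # Case 1 (buffer holds the page) and case 2 (contents replace the page)
    # both end with the named page equal to the buffer contents.
    pages[lpa] = bytes(entry["data"])
    del buffers[buf]   # the buffer is released


pages = {7: b"old", 8: b"other"}
buffers = {
    1: {"lpa": 7, "data": bytearray(b"new"), "modified": True},
    2: {"lpa": None, "data": bytearray(b"tmp"), "modified": False},
}

try:
    page_write(buffers, pages, 1, 7, rob=1)   # modified page with ROB set
except PageWriteError as err:
    print("error:", err)

page_write(buffers, pages, 1, 7)   # case 1 with ROB = 0
page_write(buffers, pages, 2, 8)   # case 2: contents replace page 8
print(pages)
```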

COPY <buffer number>

The  contents  of  the  buffer  are  disassociated  from  any 
scratch  page,  data  page,  or  other  buffer.  The  buffer’s 
status  is  identical  to  that  of  a  newly-allocated  buffer, 
except  that  it  contains  the  same  data  as  it  did  before 
the  COPY.  Data  in  the  buffer  is  effectively  in  core  and 
can  be  operated  upon  directly  by  its  owner.  Subsequent 
changes  to  the  buffer's  contents  will  not  affect  the 
original  source  of  the  data. 

ASSIGN/DEASSIGN <scratch page LPA or LPAs>

These commands assign and deassign scratch pages. When a
page has been assigned it may be used in a SPR or SPW
command.


The following five commands are for the purpose of informing
the storage manager of a user's intentions. They are
not required, and are given only to aid the storage manager
in making scheduling decisions.


BLOCK

It is a convenient time for the user to be blocked. Thus,
if the storage manager intends to block this user, the
block should occur now. Before restarting the user (if
indeed it was blocked), the storage manager would probably
ensure that the user's in-use buffers were in core.


PRE-READ  <LPA> 

The  user  intends  to  read  the  page  sometime  in  the  future 
and  the  storage  manager  should  be  prepared. 


POST-READ  <LPA> 

The  effect  of  a  previous  pre-read  command  for  the  page  is 
negated.  The  user  does  not  intend  to  access  the  page  in 
the  future. 


IN-USE <buffer #>

The process intends to reference the indicated buffer. The
storage manager will attempt to keep the page in core.


NOT-IN-USE <buffer #>

The effect of previous in-use commands is negated.

3.4 Mass Storage Device

The storage manager's main function is to manage the datacomputer
storage system so that it is efficiently used,
while the rest of the datacomputer system remains largely
independent of storage hardware considerations. In this
section a short discussion of how this is accomplished with
the UNICON 690 is presented.

Recall  that  the  storage  manager  users — the  request  handler 
and  directory  system — act  as  though  data  were  stored  on 
updatable pages of fixed size. The UNICON stores data in
fixed-size units, clockwords, containing data bits, which
cannot be changed once they have been written. The storage
manager simulates updatable pages with the SOUPS scheme.
With SOUPS, all pages are regarded as initially containing
all  zeroes.  When  a  page  is  updated,  a  modification  record 
is  written.  The  modification  record  indicates  the  new 
contents  and  the  location  of  any  words  changed.  When  a 
page  is  read  from  storage,  all  modifications  are  read  in 
sequence,  into  an  initially  zeroed  buffer.  When  all 
modifications  have  been  processed,  the  buffer  contains 
the  latest  copy  of  the  page. 
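
The  replay  of  modification  records  can  be  sketched  as
follows.  This  is  an  illustration  only:  the  page  size,  the
record  layout  (a  list  of  index/value  pairs)  and  the
function  names  are  assumptions,  not  the  report's  format.

```python
PAGE_WORDS = 8  # hypothetical page size, in words


def read_page(mod_records):
    """Rebuild a page by replaying its modification records in order.

    Each record is a list of (word_index, new_value) pairs.
    """
    page = [0] * PAGE_WORDS          # every page starts as all zeroes
    for record in mod_records:       # oldest record first
        for index, value in record:
            page[index] = value      # later records override earlier ones
    return page


def update_page(mod_records, changes):
    """Updating never overwrites: it appends a new modification record."""
    return mod_records + [list(changes)]


records = []
records = update_page(records, [(0, 7), (3, 9)])
records = update_page(records, [(3, 4)])
page = read_page(records)
```

Word  3  is  written  twice  here,  and  only  the  later  value
survives  the  replay,  while  both  records  remain  stored.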

SOUPS  requires  that  space  be  allocated  for  modification
records,  that  too-frequently  modified  pages  be  rewritten
in  a  new  location,  and  that  an  entire  clockword  be
written  to  change  an  isolated  bit.  In  spite  of  these
drawbacks,  it  is  appropriate  for  a  wide  range  of
applications,  and  admirably  hides  the  write-once  property
of  the  storage  medium.

The  response  properties  of  the  device  (access  times,
rotation  times,  etc.)  are  more  complex  than  those  of  a  disk
or  drum.  The  volume  of  data  storage  is  a  strip,  containing
1.7  billion  bits.  Strips  can  be  mounted  on  either  of  two
rotating  drums,  in  which  case  they  can  be  read  or  written,
or  they  can  be  in  the  carousel,  from  which  they  can  be
mounted  without  human  intervention.  Worst  case  access  time
for  data  on  a  mounted  strip  is  100  milliseconds,  while
strip-changing  time  is  as  high  as  10  seconds.  Thus  it  is
vital  to  minimize  strip  changes.

Recall  now  how  the  storage  manager  is  to  present  the  access 
properties  of  the  storage  system  to  its  users.  The  set  of 
all  data  pages  is  partitioned  into  logical  units  called
files.  Pages  within  a  file  are  considered  equally
accessible.  Since  the  request  handler  recognizes  that  crossing
file  boundaries  should  be  minimized,  it  can  help  to
minimize  strip  changes,  without  being  tied  to  the  concept
of  a  strip  or  a  strip  mount.

Also,  the  storage  manager  can  most  effectively  minimize 
strip  mounts  for  the  system  by  making  use  of  PRE-READ 
information,  which  it  gets  from  all  its  users,  and  by 
using  its  enormous  disk  buffer.  For  example,  a  process 
requesting  input  can  run  until  it  needs  a  page  that  is  not
available  on  disk  or  on  the  mounted  strip,  and  can  then
block.  When  it  is  again  started  up,  most  of  its  pages  will 
have  already  migrated  to  disk  from  the  UNICON.

The  PRE-READ  system  is  least  effective  for  processes  that
cannot  determine  the  addresses  of  the  pages  they  want  without
first  looking  at  some  other  pages  on  the  same  strip.  For
this  case,  the  strip-changing  algorithm  can  compensate
somewhat  by  not  dismounting  a  strip  until  the  process  that
requested  it  gets  some  opportunity  to  generate  more  requests.
Another  approach  is  to  reflect  in  the  PRE-READ  information
the  intent  to  read  more  pages  from  the  same  file  after  the
first  request  is  satisfied.
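
One  way  PRE-READ  information  could  reduce  strip  mounts  is
sketched  below.  The  policy  shown  (mount  the  strip  with  the
most  outstanding  pages  first,  and  stage  all  of  them  to
disk  during  that  one  mount)  is  an  assumption  made  for
illustration;  the  report  does  not  give  the  actual
strip-changing  algorithm.  The  function  name  and  the
(strip,  page)  addressing  are  likewise  hypothetical.

```python
from collections import defaultdict


def plan_mounts(pre_reads, on_disk):
    """Group outstanding PRE-READ pages by strip, busiest strip first.

    pre_reads: iterable of (strip, page) pairs announced by all users.
    on_disk:   set of (strip, page) pairs already staged on disk.
    """
    wanted = defaultdict(set)
    for strip, page in pre_reads:
        if (strip, page) not in on_disk:
            wanted[strip].add(page)
    # One mount per strip; every wanted page is staged while it is mounted.
    order = sorted(wanted, key=lambda s: (-len(wanted[s]), s))
    return [(s, sorted(wanted[s])) for s in order]


plan = plan_mounts(
    pre_reads=[(2, 10), (1, 5), (2, 11), (1, 6), (2, 12), (3, 1)],
    on_disk={(1, 6)},
)
```

Six  announced  pages  are  served  with  three  mounts,  and  the
page  already  on  disk  causes  no  mount  at  all.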

To  summarize,  the  UNICON,  which  is  unusual  in  its  response
characteristics  and  storage  medium,  is  accommodated  in  the 
storage  manager  with  a  special  implementation  of  the  idea 
of  page,  and  a  standard  interpretation  of  the  idea  of  file. 
The  PRE-READ  information  and  large  buffer  space  further 
reduce  the  difficulties  of  hiding  the  device  characteristics. 


Chapter  4 
The  I/O  Manager 


4.1  Function 

The  I/O  manager  is  the  datacomputer's  interface  with  the
outside  world.  All  data,  and  all  requests  for  service,
initially  enter  the  datacomputer  through  the  I/O  manager.
Thus  on  the  outside,  the  I/O  manager  has  the  problem  of
interfacing  with  a  variety  of  hardware  devices,  data
formats,  and  software  systems.  On  the  inside,  it  services
the  datacomputer  system  by  providing  a  standard  protocol
for  connection  control.  It  also  buffers  data  when
necessary,  to  make  the  rate  of  data  flow  convenient  to
datacomputer  users  and  to  the  request  handler.


4.2  Inside  Interface

A  caller  can  give  four  commands  to  the  I/O  manager:  GET, 
PUT,  OPEN  and  CLOSE.  The  caller  is  usually  the  request
handler.

OPEN  establishes  a  connection  between  the  caller  and  a 
source  of  data  in  the  outside  world.  The  source  can  be  a 
program  in  another  computer,  a  file  of  data  on  tape,  etc. 

The  caller's  end  of  the  connection  is  a  logical  port.  OPEN 
returns  a  port  number,  which  the  caller  can  use  to  reference 
the  port  established. 

CLOSE  terminates  such  a  connection,  and  makes  further 
references  to  the  logical  port  invalid. 

GET  causes  the  I/O  manager  to  pass  one  block  of  data  or  one 
item  of  control  information  to  the  caller.

PUT  passes  one  block  of  data  or  one  item  of  control 
information  to  the  I/O  manager  from  the  caller. 

The  control  information  passed  in  GETs  and  PUTs  is  data
stream  punctuation.  An  example  is  the  inter-record  gap,
when  the  physical  tape  record  boundary  has  logical
significance.  Here  the  inter-record  gap  acts  as  punctuation,
yet  it  is  not  data  in  the  same  sense  as  the  data  in  the
record.  If  the  I/O  manager  reads  an  inter-record  gap  on
a  tape  file,  an  item  of  control  information  is  passed  to
the  caller.  If  the  caller  wishes  an  inter-record  gap  to
be  written  on  a  tape  file,  he  must  pass  an  item  of  control
information  to  the  I/O  manager.  (Note:  this  is  true,  of
course,  only  when  the  inter-record  gap  has  logical
significance.)

The  control  information  passing  between  I/O  manager  and
caller  is  communicated  by  means  of  GETs  and  PUTs,  and  must
be  synchronized  with  the  data  stream  itself.  A  GET  or  PUT
passes  either  a  block  of  data  or  a  control  information
item.  Thus  a  block  of  data  can  contain  no  control
information.  Aside  from  this,  there  are  no  restrictions  on
the  size  of  the  block,  except  implementation-dependent  ones.
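
The  four  commands  and  the  rule  that  each  GET  or  PUT
carries  either  a  data  block  or  a  control  item  can  be
sketched  as  below.  The  class  names,  method  signatures  and
the  use  of  an  in-memory  queue  as  the  "outside  world"  are
all  assumptions  for  illustration;  a  real  port  would  be
backed  by  a  network  connection  or  a  tape  file.

```python
from collections import deque


class Control:
    """An item of data-stream punctuation, e.g. an inter-record gap."""

    def __init__(self, kind):
        self.kind = kind


class IOManager:
    def __init__(self):
        self._ports = {}
        self._next_port = 1

    def open(self, source_items=()):
        """Connect to an outside source; return a logical port number."""
        port = self._next_port
        self._next_port += 1
        self._ports[port] = deque(source_items)
        return port

    def close(self, port):
        del self._ports[port]            # later references are invalid

    def get(self, port):
        """Pass one data block or one control item to the caller."""
        return self._ports[port].popleft()

    def put(self, port, item):
        """Accept one data block or one control item from the caller."""
        if not isinstance(item, (bytes, Control)):
            raise TypeError("item must be a data block or a control item")
        self._ports[port].append(item)


io = IOManager()
port = io.open()
io.put(port, b"record one")
io.put(port, Control("inter-record gap"))
first = io.get(port)
second = io.get(port)
io.close(port)
```

Keeping  control  items  as  a  separate  type  in  the  same
queue  preserves  their  position  relative  to  the  data.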

For  design  purposes,  a  block  can  be  defined  as  a  convenient
chunk  of  data  containing  no  control  items.


4.3  Outside  Interface

The  outside  interface  of  the  I/O  manager  is  defined  in  terms 
of  the  datacomputer  implementation.  Initially,  the
datacomputer  system  will  be  interfaced  to  the  ARPA  network  and
the  ILLIAC  IV.  In  addition,  the  I/O  manager  will  support 
magnetic  tape  as  a  means  of  data  input/output,  and  local 
teletype-compatible  terminals  for  debugging. 

The  software  interface  to  the  ILLIAC  IV  is  under  study  at 
the  time  of  this  publication. 

The  software  interface  to  the  ARPA  network,  at  the  I/O
manager  level,  is  defined  by  the  official  network
protocols,  NIC  7104.  Below,  some  especially  relevant  features
of  these  protocols  are  discussed.

On  the  network,  a  user  becomes  connected  to  the  datacomputer 
by  executing  the  Initial  Connection  Protocol.  As  a  result 
of  this  he  has  two  half-duplex  connections.  On  one  he  can 
send  messages  to  the  datacomputer,  and  on  the  other,  the
datacomputer  can  send  messages  to  him.  The  messages  are
formatted,  and  message  flow  is  controlled,  according  to
HOST-HOST  Protocol.  Each  message  has  a  header  and  text.

The  header  identifies  the  connection,  states  the  length  of 
the  text,  and  supplies  other  miscellaneous  information.  The 
text  contains  an  integral  number  of  bytes  of  data,  where 
byte  size  is  a  parameter  determined  during  the  initial 
connection  procedure. 

The  boundaries  of  messages  have  no  logical  significance, 
so  the  concatenation  of  the  texts  of  all  the  messages  sent 
over  a  single  connection  constitutes  a  single  string  of
bytes.

The  Data  Transfer  Protocol  is  used  to  partition  the  byte
stream  into  logical  units.  A  byte  stream  formatted 
according  to  Data  Transfer  Protocol  becomes  a  stream  of 
transactions.  There  are  several  types  of  transactions, 
of  which  three  are  of  interest  here: 

A.  The  data  transaction,  which  is  used  to  send  all
data.

B.  The  control  transaction,  which  is  used  to  send 
all  datalanguage  and  diagnostics. 

C.  The  information  separator  transaction,  which  is 
used  to  make  logical  unit  boundaries. 

To  transmit  a  group  of  data  records  to  the  datacomputer,
each  record  might  be  formatted  as  one  or  more  data
transactions.  Following  each  record  there  would  be  an
information  separator,  marking  the  end  of  the  record.  The  end
of  the  group  would  be  indicated  with  a  higher  level
separator.
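
The  record  and  group  framing  just  described  can  be
sketched  as  follows.  This  does  not  reproduce  the  actual
Data  Transfer  Protocol  encoding:  the  tuple  representation,
the  separator  levels  1  and  2,  and  the  chunk  size  are
assumptions  chosen  only  to  show  the  structure.

```python
DATA, SEPARATOR = "data", "separator"


def frame_records(records, chunk=4):
    """Turn records into transactions: data chunks, a level-1 separator
    after each record, and a level-2 separator after the whole group."""
    out = []
    for rec in records:
        for i in range(0, len(rec), chunk):
            out.append((DATA, rec[i:i + chunk]))
        out.append((SEPARATOR, 1))       # end of record
    out.append((SEPARATOR, 2))           # end of group (higher level)
    return out


def unframe(transactions):
    """Recover the records from a transaction stream."""
    records, current = [], b""
    for kind, body in transactions:
        if kind == DATA:
            current += body
        elif kind == SEPARATOR and body == 1:
            records.append(current)
            current = b""
    return records


stream = frame_records([b"alpha", b"be"])
recovered = unframe(stream)
```

The  first  record  is  split  across  two  data  transactions,
yet  the  receiver  recovers  it  whole  because  record
boundaries  come  from  separators,  not  from  transaction
boundaries.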

Data  Transfer  Protocol  allows  the  I/O  manager  to  distinguish 
between  datalanguage  and  data  on  the  same  connection,  and 
to  identify  logical  units  and  groups  of  logical  units.  The 
former  ability  is  significant  for  detection  of  user  errors 
and  synchronization  of  user  program  and  datacomputer
recovery.  The  latter  is  useful  for  scheduling  within  the
datacomputer. 

Magnetic  tape  is  viewed  as  a  data  I/O  device.  That  is, 
controlling  datalanguage  is  assumed  to  be  coming  from 
another  source.

The  tape  medium  has  two  kinds  of  information  separators:
inter-record  gap  and  end-of-file  marker.  In  addition,  tape
contents  can  have  structure  in  the  form  of  labels,
recording  formats,  and  even  directories.  Any  such  structures
that  are  not  to  be  described  explicitly  in  datalanguage  and
processed  by  the  request  handler  must  be  "understood"  by  the
I/O  manager.  To  make  the  problem  simpler  at  the  outset,
the  I/O  manager  will  handle  a  relatively  small  number  of
tape  formats.  Since  most  data  is  being  transferred  in  or
out  over  the  network  or  over  the  ILLIAC  IV  interface,  this
restriction  is  deemed  unimportant.


4.4  Internals

In  terms  of  the  system  architecture,  the  I/O  manager’s  most 
interesting  problem  is  buffering.  Except  when  the  amount 
of  buffered  data  for  a  certain  connection  becomes  excessive, 
or  when  the  system  is  dedicating  too  much  space  to  I/O 
buffering,  the  I/O  manager  will  accept  all  the  input  with 
which  it  is  presented.  This  means: 

A.  to  most  users,  the  datacomputer  appears  to  accept
data  as  fast  as  they  can  send  it.

B.  to  the  request  handler,  most  users  appear  to
accept  data  as  fast  as  it  (the  request  handler)  can
generate  it.

C.  to  the  request  handler,  some  users  appear  to
produce  data  at  the  rate  of  the  datacomputer  storage
system.

These  three  effects  are  extremely  desirable.  For  the  user 
interfacing  with  the  datacomputer  at  a  data  rate  below  that 
of  the  datacomputer  storage  system,  the  first  is  helpful
because  it  enables  him  to  transfer  a  given  amount  of  data  to 
the  datacomputer  in  the  minimum  possible  time.  For  moderate 
amounts  of  data,  this  will  be  independent  of  the  amount  of 
processing  the  datacomputer  must  do  prior  to  storage  of  the 
data.  Likewise,  it  will  be  independent  of  the  loading  of 
the  datacomputer.  (This  effect  is  not  achieved  for  the 
ILLIAC  IV  system,  which  can  sustain  data  rates  of  one 
billion  bits  per  second  for  up  to  one  second.  However,  the 
I/O  manager  will  accept  input  from  the  ILLIAC  IV  up  to  the
data  rate  of  the  datacomputer  storage  devices.  Currently
this  is  limited  to  3.3  million  bits  per  second,  but  studies
have  shown  that  there  is  a  feasible  way  to  raise  this  to 
*10  million  bits  per  second  for  bursts  up  to  several  billion 
bits.)

The  second  and  third  effects  are  useful  because  of  the  way 
in  which  the  datacomputer  operates,  and  the  conditions  it 
will  encounter.  To  service  a  particular  user  a  certain  set 
of  buffers  belonging  to  that  user,  the  working  set,  must 
normally  be  in  core.  If  any  of  these  buffers,  which  are 
all  frequently  referenced,  are  not  in  core,  then  very  little 
service  will  be  given  before  the  request  handler  will 
reference  one  that  must  be  read  in  from  disk.  The  working 
set  is  established  dynamically,  and  migrates  from  disk  to 
core  when  the  user  is  getting  some  service.  Optimal  core 
usage  is  achieved  when  the  request  handler  process  gets  its 
working  set  in  core  and  then  interfaces  with  the  user  at 
storage  system  rates.  These  rates  are  of  course  achieved 
when  the  I/O  manager  has  queued  some  input  or  when  the  I/O 
manager  is  accepting  and  queueing  all  output  at  the  rate 
the  request  handler  generates  it. 

The  buffering  is  implemented  by  queueing  the  data,  by 
destination,  initially  in  core  and,  when  required,  on 
scratch  pages.  Queueing  on  scratch  pages  involves  an  in- 
core  copy  operation  to  pack  the  data  onto  the  pages,  except 
when  the  unit  being  added  to  the  queue  is  a  full  page  of 
data.  In  this  case  the  storage  manager  is  asked  to  copy 
the  page,  which  it  can  usually  accomplish  without  the 
physical  copy  operation. 


For  some  devices,  reformatting  of  the  data  stream  is
required  between  useful  request  handler  processing  and  the
device  I/O  operation.  An  example  is  a  tape  drive
transmitting  36-bit  words  (this  is  a  problem  because  all  request
handler  data  is  in  32-bit  words).  Here  the  I/O  manager
must  do  a  shift-and-copy  operation,  which  it  will  combine
with  formatting  the  data  on  a  queue  page,  when  that
operation  takes  place.  When  no  queue  is  forming,  a  copy  is
forced  in  this  case.  In  the  more  normal  cases,  however,
there  are  one  or  zero  copy-overs  in  the  I/O  manager  if  the
data  gets  queued,  and  none  if  it  doesn't.
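
A  shift-and-copy  of  this  kind  can  be  sketched  as  a  bit
repacking  loop.  The  report  does  not  define  the  exact
conversion,  so  the  choices  here  (most-significant  bits
first,  zero  padding  of  a  final  partial  word,  and  the
function  name  repack)  are  assumptions.

```python
def repack(words, src_bits=36, dst_bits=32):
    """Repack a stream of src_bits-wide words into dst_bits-wide words.

    Bits are taken most-significant first; a final partial word is
    padded with zero bits on the right.
    """
    acc = 0          # bit accumulator
    nbits = 0        # number of valid bits currently held in acc
    out = []
    mask = (1 << dst_bits) - 1
    for w in words:
        acc = (acc << src_bits) | (w & ((1 << src_bits) - 1))
        nbits += src_bits
        while nbits >= dst_bits:
            nbits -= dst_bits
            out.append((acc >> nbits) & mask)
        acc &= (1 << nbits) - 1      # keep only the unconsumed bits
    if nbits:
        out.append((acc << (dst_bits - nbits)) & mask)
    return out


packed = repack([0xFFFFFFFFF, 0x000000000])
```

Eight  36-bit  words  hold  exactly  the  same  number  of  bits
as  nine  32-bit  words,  so  input  in  multiples  of  eight  words
needs  no  padding.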

The  actual  response  of  the  core  management  system  in  any 
real  situation  is  determined  dynamically  according  to  the 
scheduling  heuristics  in  the  storage  manager.  The  previous 
discussion  is  presented  only  to  suggest  the  advantages  that 
can  be  obtained  by  appropriate  buffering  in  the  I/O  manager. 
Since  both  the  size  of  queues  and  the  management  of  all
core  are  functions  of  the  storage  manager,  the  I/O  manager
is  only  concerned  with  formatting,  building  and  maintaining
the  queues.


Chapter  5 
The  Supervisor 


5.1  Function 

The  supervisor  schedules  the  central  processor,  creates  and 
deletes  processes  and  performs  some  miscellaneous  operating 
system  functions.  Among  these  are  system  bootstrap  loading, 
operator  communication,  and  management  of  the  clock  and 
priority  interrupt  system. 

In  scheduling  the  CPU,  the  supervisor  determines: 

A.  When  to  interrupt  a  running  process  for  the  purpose
of  giving  another  process  the  chance  to  use  the  CPU.
Many  processes  are  designed  to  interrupt  themselves  at
convenient  times,  but  this  is  not  always  adequate.

B.  Which  waiting  process  should  run  next,  when  the
last  running  process  has  stopped.

C.  When  there  are  too  many  or  too  few  processes
contending  for  the  CPU  and  core.  When  such  a  determination
has  been  made,  the  storage  manager  is  informed  and  tries  to
rectify  the  situation  by  swapping  processes  in  or  out.


5.2  Design

Because  the  functions  of  the  supervisor  are  so  basic  to  any 
level  of  datacomputer  system  operation,  the  supervisor  is 
more  rigorously  engineered,  with  respect  to  crash/recovery, 
than  the  other  four  modules.  When  any  of  the  other  modules 
crash,  the  supervisor  obtains  control  and  initiates  recovery 
procedures.  While  occasionally  a  non-supervisor  crash  can
have  severe  short-term  consequences,  such  crashes  are  not
generally  as  serious  as  supervisor  crashes,  which  imply
reloading  the  entire  system,  human  intervention,  and
(conceivably)  loss  of  most  or  all  temporary  information.

Thus  extreme  care  is  taken  in  allotting  functions  to  the 
supervisor.  It  is  constrained  to  be  simple  in  design,  to 
retain  control  under  all  conditions  its  designers  can 
conceive  of,  and  to  delegate  to  other  modules  any
functions  not  absolutely  basic  to  its  function.


5.3  Implementation  Strategy

The  supervisor  will  evolve  during  the  implementation  of  the
datacomputer.  Initially  it  will  consist  of  appropriate
modules  from  an  existing  operating  system.  In  this  stage,
scheduling  algorithms  may  not  be  optimal  for  the
datacomputer.  When  a  primitive  datacomputer  has  been  established,
design  and  implementation  of  the  real  supervisor  will  be
undertaken.  The  design  effort  will  then  be  based  on  some
useful  experience  and  on  a  better  understanding  of  the
datacomputer  system  than  is  possible  presently.


Chapter  6
Directory  System 


6.1  Function 

The  directory  system  maintains  data  descriptions,  file 
names  and  locations,  file  security  information  and  some 
accounting  information. 

In  some  respects  the  directory  system’s  functions  are 
similar  to  the  request  handler's,  and  the  directory  system 
could  probably  be  implemented  in  datalanguage.  However,
the  directory  system's  database  is  of  such  crucial  importance
to  the  operation  of  the  datacomputer  that  it  is  separated
from  the  request  handler  for  the  sake  of  reliability.  The 
directory  system  can  be  expected  to  stabilize  and  undergo 
little  change  during  the  second  half  of  the  implementation 
of  the  request  handler.  During  this  period,  it  will  be 
important  to  isolate  the  file  directory  from  any  bugs  that 
occur  in  the  request  handler. 

When  the  full  datalanguage  has  been  implemented,  the  request 
handler  could  indeed  become  as  stable  as  the  directory  system. 
At  this  point,  integration  is  undesirable  for  other  reasons. 
For  one  thing,  further  development  of  the  datacomputer
concept  may  at  this  stage  have  implications  for  the  directory
system.  With  multiple  datacomputers  in  the  same  system,  data
disposition  among  the  datacomputers  is  something  that  should
probably  be  settled  by  the  datacomputers  in  a  fashion
invisible  to  their  users.  Under  these  circumstances,  the
directory  systems  of  the  several  datacomputers  must  either
cooperate  or  be  combined  into  a  single  directory  system.
Inter-system  problems  should  probably  be  handled  by  the
directory  systems  and  storage  managers  and  hidden  from  the
request  handlers.

Thus  the  directory  system  has  different  reliability
requirements  and  will  probably  eventually  have  decidedly
different  functions  than  the  request  handler.


6.2  The  File  Directory

Most  of  the  directory  system’s  data  is  stored  in  the  file 
directory,  and  a  major  part  of  its  job  is  the  maintenance 
and  use  of  this  directory. 

The  datacomputer  file  directory  is  unusual  in  both  size  and
content.  Just  the  on-line  storage  will  give  rise  to  an 
extremely  large  file  directory  by  conventional  standards. 
(To  get  a  feel  for  this,  consider  the  on-line  storage  as 
the  equivalent  of  10,000  tapes.)  The  off-line  storage  is 
unlimited,  and  must  be  kept  in  the  directory  also.  In 
content  the  directory  is  unusual  because  it  contains  not 
only  file  names  and  pointers,  but  datalanguage
descriptions,  which  are  used  in  the  compilation  process.  The
distinction  between  files  and  other  kinds  of  data
containers  (like  records,  trees,  fields,  groups  of  records,
etc.)  is  somewhat  blurred  in  datalanguage,  and  this  may 
give  rise  to  heavier  usage  of  the  file  directory  than  in 
other  systems.  This  point  is  most  easily  understood  in
terms  of  the  structure  of  the  directory,  explained  in  the 
remainder  of  this  section. 

The  directory  is  a  tree  structured  file  in  which  each  node
of  the  tree  is  a  record.  If  the  record  points  to  other
records  it  is  a  directory  (or  sub-directory)  node.  The
ends,  or  leaves,  of  the  tree  are  descriptor  nodes  that
contain  a  data  description.  A  record  that  describes  a  file
may  be  referred  to  as  a  "file"  node,  a  special  case  of  a
descriptor  node.  Each  descriptor  node  has  a  name:  the  name
of  the  highest-level  data  container  described.  Names  below
this  level  are  not  known  to  the  directory  system.

Each  record  has  a  "node  name"  that  is  a  character  string
without  any  blanks  or  other  special  characters.  Each
record  also  has  a  "path  name"  that  is  the  concatenation  of
all  the  node  names  (with  periods  in  between)  of  the 
directory  records  traversed  in  order  to  arrive  at  the 
selected  node.  Path  names  will  be  unique.  Node  names  may 
be  repeated. 

It  is  expected  that  the  higher  nodes  will  correspond  to 
organizations,  projects  and/or  people,  but  the  directory
system  will  not  assume  this.  The  path  name  for  a  file's 
descriptor  record  is  that  file's  "formal  name".  Since 
formal  names  are  unwieldy,  files  will  be  referenced 
by  "normal  names".  The  request  handler  will  prefix  a  path 
name  to  the  normal  name  in  order  to  derive  the  formal  name. 
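
The  naming  scheme  can  be  sketched  with  a  small  tree.  The
node  names  used  (CCA,  WEATHER,  STATIONS),  the  class  and
function  names,  and  the  reading  that  a  path  name  includes
the  selected  node's  own  name  are  all  assumptions  made  for
illustration.

```python
class Node:
    def __init__(self, name, description=None):
        self.name = name                 # node name: no blanks or periods
        self.description = description   # set on descriptor (leaf) nodes
        self.children = {}

    def add(self, child):
        self.children[child.name] = child
        return child


def find(top, path_name):
    """Follow a period-separated path name down from `top`."""
    node = top
    for part in path_name.split("."):
        node = node.children[part]
    return node


def formal_name(prefix, normal_name):
    """Prefix a path name to a normal name to form the formal name."""
    return prefix + "." + normal_name


root = Node("")
proj = root.add(Node("CCA")).add(Node("WEATHER"))
proj.add(Node("STATIONS", description="file of station records"))

name = formal_name("CCA.WEATHER", "STATIONS")
node = find(root, name)
```

Storing  children  in  a  mapping  keyed  by  node  name  makes
path  names  unique  while  still  allowing  the  same  node  name
to  appear  under  different  parents.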

Nodes  that  describe  lists  of  files  or  files  composed  of
subfiles  will  be  specified  at  a  later  date.


6.3  File  Numbers

Files  may  be  referenced  by  any  of  three  types  of  file
numbers.  First,  a  permanent  file  number  (PFN)  is  associated
with  a  physical  file.  It  is  assigned  by  the  directory 
system  and  never  changes  even  if  the  file  is  renamed.  PFNs 
are  never  reused,  even  if  a  file  is  deleted.  Second,  the 
directory  file  number  (DFN)  is  a  pointer  to  the  file  node 
in  the  directory.  It  corresponds  to  a  logical  file.  It  is 
used  in  calls  to  the  directory  system,  and  to  the  storage 
manager  when  opening  a  file.  Third  is  the  local  file  number
(LFN)  assigned  by  the  storage  manager.  It  is  meaningful
only  for  open  files,  and  is  used  in  calls  to  the  storage
manager.


6.4  Calls  from  the  Storage  Manager

When  the  storage  manager  calls  the  directory  system,  it 
always  supplies  a  DFN  as  a  primary  argument,  and  a  pointer 
to  an  extent  block  (or  partial  block)  as  an  optional
argument.  The  directory  system  always  returns  a  pointer  to  a
complete,  up-to-date  extent  block  corresponding  to  that
file.  There  are  entries  to  get  a  map  when  opening  a  file,
allocate  more  space  within  a  file,  and  free  up  unused/unneeded
space  within  a  file.


6.5  Calls  from  the  Request  Handler

When  the  request  handler  calls  the  directory  system  it 
always  supplies  a  directory  pointer  to  be  used  as  the  "top"
node.  Zero  is  used  to  specify  the  real  top.  This  is  used 
to  speed  up  directory  searches.  The  second  argument  is  the 
path  name  of  the  desired  node.  The  directory  system  always 
returns  a  pointer  to  the  up-to-date  node.  There  are  calls 
to  search,  create,  delete,  and  modify  directory  records.


6.6  Datalanguage-Directory  Interactions

Several  datalanguage  commands  interact  closely  with  the
directory  system.  They  are  outlined  here.

LOGOUT  causes  the  system  to  forget  everything  that  gets 
set  up  by  the  following  commands. 

ACCOUNT  specifies  a  user's  account  number  to  the
datacomputer.

ATTACH  specifies  a  prefix  to  be  used  when  converting  normal 
names  to  full  names.  A  user  may  be  attached  to  several 
nodes  at  one  time.  If  so,  they  are  searched  in  order. 

LOGIN  combines  LOGOUT,  ACCOUNT,  and  ATTACH,  in  that  order.

OPEN  specifies  a  normal  file  name  that  is  to  be  used  when
trying  to  identify  a  datacomputer  name.  A  user  may  have
several  files  open  at  once.  If  so,  the  options  supplied
by  OPEN  determine  if  the  first  match  is  accepted,  if
ambiguities  result  in  an  error,  or  if  all  matches  will  be
processed.

CLOSE  un-opens  a  file  or  all  the  user's  open  files.


6.7  Restrictions


Any  node  of  the  file  directory  may  contain  a  restriction 
block.  As  directories  are  searched,  the  privilege  bits 
from  an  entry  for  the  user  are  ANDed  together.  Whenever  a 
privilege  is  revoked  (by  setting  its  corresponding  bit  to 
zero)  it  is  also  revoked  for  all  nodes  below.  If  no
restriction  block  is  present,  no  privileges  are  revoked.

If  a  block  exists,  but  no  entry  corresponds  to  the  user, 
all  privileges  are  revoked  and  the  search  is  aborted. 
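
The  ANDing  rule  can  be  sketched  as  below.  The  bit
assignments,  the  starting  value  of  all  privileges  granted,
and  the  representation  of  a  restriction  block  as  a
mapping  from  user  to  bits  are  assumptions  for
illustration;  password  handling  is  left  out.

```python
ALL = 0b111   # hypothetical privilege bits: read, write, delete
READ, WRITE, DELETE = 0b001, 0b010, 0b100


def effective_privileges(path_nodes, user):
    """AND the user's privilege bits over the nodes along a search path.

    path_nodes: restriction blocks, top node first. Each block is
    None (no block at that node) or a dict mapping user -> bits.
    Returns None if the search is aborted.
    """
    bits = ALL
    for block in path_nodes:
        if block is None:
            continue                 # no block: nothing revoked here
        if user not in block:
            return None              # block without an entry: abort
        bits &= block[user]          # a cleared bit stays cleared below
    return bits


path = [None, {"smith": READ | WRITE}, {"smith": READ | DELETE}]
smith = effective_privileges(path, "smith")
jones = effective_privileges(path, "jones")
```

The  lower  block  cannot  restore  the  delete  privilege,
because  the  higher  block  has  already  cleared  that  bit.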

Any  entry  in  a  privilege  block  may  contain  a  password.
If  the  user  has  supplied  a  password,  entries  with
passwords  are  checked  first.  There  may  be  multiple  entries
for  a  user  with  different  passwords  to  allow  a  user  to
have  different  privileges  at  different  times.  If  the
password  does  not  match,  other  entries  are  checked.

It  is  expected  that  most  directories  that  correspond  to 
projects  or  users  will  have  privilege  blocks.  Normally 
these  blocks  will  allow  read-only  access  to  most  data  by
most  users  and  read/write  access  to  users  of  the  same 
group.  Specific  restrictions  can  be  provided  to 
restricted  users  or  groups  by  providing  the  appropriate 
password  entries. 

To  the  datacomputer,  a  user  is  identified  by  a  path  name
that  corresponds  to  the  first  directory  he  is  attached  to.
This  will  normally  correspond  to  the  group/person
identification  used  by  most  time-sharing  systems.  The  " #"
convention  for  any  node  name  will  be  necessary  for  convenient 
use  of  the  restriction  mechanism.  will  be  used  to 

indicate  all  subordinate  nodes.